An Efficient Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats
Authors
Abstract
Tensors, the linear-algebraic generalization of matrices to arbitrarily many dimensions, have numerous applications to data processing tasks in computer science and computational science. Many tensors used in diverse application domains are sparse, typically containing more than 90% zero entries. Efficient computation with sparse tensors hinges on algorithms that exploit the sparsity to do less work, but the irregular locations of the nonzero entries pose significant challenges to performance engineers. Many tensor operations, such as tensor-vector multiplication, can be sped up substantially by breaking the tensor into equally sized blocks (storing only the blocks that contain nonzeros) and performing operations within each block using carefully tuned code. However, selecting the best block size from among many possibilities is computationally challenging. Previously, Vuduc et al. defined the fill of a sparse tensor to be the number of entries stored in the blocked format divided by the number of nonzero entries, and showed how the fill can be used as part of an effective, efficient heuristic for evaluating the quality of a particular blocking scheme [1, 2]. In particular, they showed that if the fill could be computed exactly, then the measured performance of their sparse matrix-vector multiply was within 5% of the optimal setting. However, they gave no theoretical accuracy bounds for their fill-estimation method, and it is vulnerable to several classes of adversarial examples. In this paper, we present a sampling-based method for finding a (1 + ε)-approximation to the fill of an order-N tensor for all block sizes less than B, with probability at least 1 − δ, using O(NB log(B/δ)/ε) samples for each block size. We introduce an efficient routine that samples for all B block sizes at once in O(NB) time. We also extend our concentration bounds to a tighter bound based on sampling without replacement, using the recent Hoeffding-Serfling inequality [3].
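To make the definition concrete, the fill of a matrix (an order-2 tensor) for a block size (b1, b2) can be computed exactly by counting the distinct blocks that contain at least one nonzero. The sketch below is illustrative only; the names (`fill`, `coords`) are ours and do not come from the paper's implementation.

```python
def fill(coords, b1, b2):
    """Exact fill for block size (b1, b2): entries stored in the
    blocked format divided by the number of nonzero entries.

    coords: list of (row, col) positions of the nonzeros.
    """
    nnz = len(coords)
    # A stored block is identified by the block-coordinates of its entries;
    # each stored block holds b1 * b2 entries, zeros included.
    blocks = {(i // b1, j // b2) for i, j in coords}
    return len(blocks) * b1 * b2 / nnz
```

For the 4×4 identity pattern, 1×1 blocks give a fill of 1.0, while 2×2 blocks store two 4-entry blocks for four nonzeros, giving a fill of 2.0.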
We then implement our algorithm and evaluate it on sparse matrices from the University of Florida collection, comparing our scheme to that of Vuduc et al., as implemented in the Optimized Sparse Kernel Interface (OSKI) library, and to a brute-force method that obtains the ground truth. We find that our algorithm provides faster estimates of the fill at every accuracy level, evidence that it is an improvement in both theory and practice.
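A brute-force computation of the fill can be contrasted with a sampling estimate. One unbiased estimator consistent with the abstract's description (though not necessarily the paper's exact routine) uses the fact that the fill equals the average, over nonzeros, of the block volume divided by the number of nonzeros in that entry's block; sampling nonzeros uniformly then yields an estimate amenable to Hoeffding-style concentration bounds. All names here are hypothetical.

```python
import random
from collections import Counter

def estimate_fill(coords, b1, b2, samples, seed=0):
    """Sampling-based fill estimate for block size (b1, b2).

    Each stored block of volume b1*b2 contributes (b1*b2)/c to the
    sum over its c nonzeros, so averaging that quantity over uniformly
    sampled nonzeros is an unbiased estimator of the fill.
    """
    rng = random.Random(seed)
    # Number of nonzeros falling in each stored block.
    block_counts = Counter((i // b1, j // b2) for i, j in coords)
    total = 0.0
    for _ in range(samples):
        i, j = rng.choice(coords)
        total += b1 * b2 / block_counts[(i // b1, j // b2)]
    return total / samples
```

On the 4×4 identity pattern with 2×2 blocks, every block holds exactly two nonzeros, so each sampled term equals 2 and the estimate matches the exact fill regardless of the sample count.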
Similar Resources
Tree-based Space Efficient Formats for Storing the Structure of Sparse Matrices
Sparse storage formats describe how sparse matrices are stored in computer memory. Extensive research has been conducted on these formats in the context of performance optimization of sparse matrix-vector multiplication algorithms, but memory-efficient formats for storing sparse matrices are still under development, since the commonly used storage formats (like COO or CSR) are no...
Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs
We discuss implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs. We outline an algorithm and various optimizations, and identify potential future improvements and challenging tasks. In comparison with a previously published implementation, our implementation is faster on matrices having many high-fill-ratio blocks but slower on matrices with a low number of non-zero elements per...
"Compress and eliminate" solver for symmetric positive definite sparse matrices
We propose a new approximate factorization for solving linear systems with symmetric positive definite sparse matrices. In a nutshell, the algorithm applies block Gaussian elimination hierarchically and additionally compresses the fill-in. The systems that admit efficient compression of the fill-in mostly arise from discretization of partial differential equations. We show that the resulting fa...
Automatic Construction of Explicit R Matrices for the One-Parameter Families of Irreducible Typical Highest Weight (0̇m|α̇n) Representations of Uq[gl(m|n)]
We detail the automatic construction of R matrices corresponding to (the tensor products of) the (0̇m|α̇n) families of highest-weight representations of the quantum superalgebras Uq[gl(m|n)]. These representations are irreducible, contain a free complex parameter α, and are 2 dimensional. Our R matrices are actually (sparse) rank 4 tensors, containing a total of 2 components, each of which is in ...
Shrinkage Tuning Parameter Selection in Precision Matrices Estimation
Recent literature provides many computational and modeling approaches for covariance matrix estimation in penalized Gaussian graphical models, but relatively little study has been carried out on the choice of the tuning parameter. This paper tries to fill this gap by focusing on the problem of shrinkage parameter selection when estimating sparse precision matrices using the penalized likelih...
Publication date: 2017